The complete nucleotide sequences of the hemagglutinin/esterase (HE) genes of human coronavirus (HCV) strain 
OC43 and bovine respiratory coronavirus (BRCV) strain G95 were determined from single-stranded cDNA fragments 
generated by reverse transcription of virus-specific mRNAs and amplified by polymerase chain reaction. An open 
reading frame of 1272 nucleotides was identified as the putative HE gene by homology to the bovine coronavirus HE 
gene. This open reading frame encodes a protein of 424 amino acids with an estimated molecular weight of 47.7 kDa. 
Ten potential N-linked glycosylation sites were predicted in the HE protein of HCV-OC43 while nine of them were 
present in BRCV-G95. Fourteen cysteine residues were conserved in the HE proteins of both viruses. Two hydrophobic 
sequences at the N-terminus and the C-terminus may serve as a signal peptide and a transmembrane anchoring domain, 
respectively. The predicted HE protein of HCV-OC43 was 95% identical to the HEs of BRCV-G95 and other bovine 
coronaviruses, and 60% identical to the HEs of mouse hepatitis viruses. Phylogenetic analysis suggests that the HE 
genes of coronaviruses and influenza C virus have a common ancestral origin, and that bovine coronaviruses and 
HCV-OC43 are closely related. © 1992 Academic Press, Inc. 


Coronaviruses possess a single-stranded, nonsegmented
RNA genome of positive polarity (1, 2) and are
associated with a variety of diseases in man and
animals (3-5). Coronaviruses are divided into two major
antigenic groups. The first group includes human
coronavirus strain OC43 (HCV-OC43), bovine coronavirus
(BCV), mouse hepatitis virus (MHV), and
hemagglutinating encephalomyelitis virus of swine (HEV)
(2, 5). HCV-OC43 causes respiratory infections of man
similar to those caused by influenza viruses (6). BCV
causes enteritis of newborn calves and is also
considered to be an etiological factor of respiratory
diseases of calves (7, 8). MHV can infect different
organs, causing enteric, respiratory, and neurological
diseases (5, 9). A unique property of coronaviruses
within this antigenic cluster is the presence of the
hemagglutinin/esterase (HE) gene. The genome of
MHV-A59 contains an open reading frame (ORF) that may
code for an HE protein; however, the HE is not
expressed in cells infected with this strain (10, 11,
23). The second group, which includes HCV strain 229E,
avian infectious bronchitis virus (IBV), porcine
transmissible gastroenteritis virus, and feline
infectious peritonitis virus, lacks an HE gene (2, 5).

The HE glycoproteins of HCV-OC43, BCV, MHV-JHM, and
HEV possess hemagglutination (receptor-binding) and
acetylesterase (receptor-destroying) activities similar
to those of the HE (or HEF) glycoprotein of influenza C
virus (ICV) (12-20). It was shown that the HE
glycoprotein of BCV strain Quebec induces neutralizing
antibodies both in vitro and in vivo and thus is
important in viral infectivity (21, 22). It is
evidently not required for viral infectivity in MHV-A59
and MHV-JHM (11). The role of the HE gene and its
protein in coronavirus evolution, replication, and
pathogenesis remains unclear.

The exact genomic organization of HCV-OC43 is not
known. Antigenic and nucleic acid hybridization studies
indicate that HCV-OC43 is closely related to BCV
(23-25). By analogy to BCV, the order of the genes
coding for the structural proteins is probably
5’-HE-S-M-N-3’. Recently, the N gene of HCV-OC43 was
sequenced and found to be similar to the BCV N gene
(97.5% amino acid sequence homology) (26). The origin
and evolutionary relationships of the HE genes of
hemagglutinating coronaviruses isolated from different
species are poorly understood. To elucidate the
molecular evolution of the coronavirus HE genes, we
sequenced the HE genes of HCV-OC43, a bovine
respiratory coronavirus (BRCV), and a virulent and an
avirulent BCV strain. We report here the complete
nucleotide sequences of the HE genes of HCV-OC43 and
BRCV-G95, and their phylogenetic relatedness to those
of BCVs, MHV, and ICV.

HCV-OC43 was obtained from the American Type Culture
Collection (ATCC, 759-VR) and propagated in human
rectal tumor (HRT-18) cells as described previously
(27). Bovine respiratory coronavirus strain Giessen
89-4595 (G95) was kindly provided by Dr. W. Herbst,
Institute of Hygiene and Infectious Diseases of
Animals, Justus Liebig University Giessen, Germany.
This virus was originally isolated from nasal swabs of
a calf suffering from a respiratory disorder and was
propagated in HRT-18 cells. Isolation and purification
of viral RNA, cDNA synthesis, double-stranded (ds) cDNA
amplification and single-stranded (ss) cDNA production
by polymerase chain reaction (PCR), and DNA sequencing
were performed as described previously (28, 44).

Primers were designed to generate cDNA fragments 
from virus-specific mRNAs by reverse transcription and 
PCR amplification, based on the high degree of geno- 
mic similarity between HCV-OC43 and BCVs (25, 26). 
These primers were previously used for amplification 
and sequencing of BCV S and HE genes (28, 44). PCR- 
generated cDNA fragments were directly sequenced 
in both directions. Analysis of the sequences revealed 
that a large ORF of 1272 nucleotides was identical in 
size to the HE genes of BCVs (29, 30, 44). This ORF 
terminated 14 nucleotides upstream from the S gene 
(Zhang et al., unpublished data), and encoded a protein
of 424 amino acid residues with an estimated molecu- 
lar weight of 47.7 kDa (Figs. 1 and 2). Two identical 
sequences (CTAAAC), similar to the consensus inter- 
genic sequence upstream of the HCV-OC43 N gene 
(CTAAAT) (26) and identical to the consensus se- 
quence upstream of BCV HE and S genes, were found 
16 nt upstream of the predicted initiation codon (at
nucleotides 16 to 18) for the HE protein and 8 nt
downstream from the termination codon, respectively.
Hydropathic analysis of the predicted amino acid
sequence indicated that the putative HE protein
possessed the characteristics of a membrane protein.
Specifically, a hydrophobic stretch of 15 amino acids at
the N-terminus may serve as a signal peptide with a
cleavage site between amino acids 15 and 16 (30, 31,
44). Another hydrophobic amino acid sequence near
the C-terminus (amino acids 389 to 414) may serve as
the transmembrane domain anchoring the protein in
the viral envelope. A hydrophilic sequence of 10 amino
acids at the C-terminus may serve as an intravirion
domain. Ten potential N-linked glycosylation sites were
predicted in the HE protein of HCV-OC43 while nine of 
them were present in that of BRCV-G95. Two internal 
ORFs were predicted within the large ORF extending 
from nt 107 to 517 and from 976 to 1228. By analogy to 
the BCV HE gene, these results suggest that the predicted
large ORF 2b represents mRNA 2-1 of HCV-OC43 and
BRCV-G95, encoding the HE glycoprotein.
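
The sequon rule that underlies the prediction of potential N-linked
glycosylation sites (Asn-X-Ser/Thr, where X is any residue except Pro)
can be illustrated with a minimal Python sketch. The fragment below is
the first 62 residues of the predicted HCV-OC43 HE protein as read from
Fig. 1 and converted to one-letter code; the function name is an
illustrative choice, and this is not a restatement of the software the
authors used.

```python
import re

# First 62 residues of the predicted HCV-OC43 HE protein, read from Fig. 1
# and converted to one-letter code (assumption: the figure text was
# transcribed correctly).
HE_N_TERMINAL = (
    "MFLLPRFILVSCIIGSLGFYNPPTNVVSHV"
    "NGDWFLFGDSRSDCNHIVNINPHNYSYMDLNP"
)


def find_sequons(protein):
    """Return (1-based position, tripeptide) for each N-X-S/T sequon, X != P."""
    # A lookahead is used so that overlapping sequons are all reported.
    pattern = re.compile(r"(?=(N[^P][ST]))")
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(protein)]


print(find_sequons(HE_N_TERMINAL))  # [(54, 'NYS')]
```

In this fragment only Asn-54 is followed by a residue other than Pro
and then by Ser or Thr; a scan of a full-length sequence would proceed
in the same way.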

The amino acid sequences predicted from the HE genes
of HCV-OC43 and BRCV-G95 (Fig. 1), BCV-Mebus (30),
BCV-Quebec (29), BCV-L9 and BCV-LY138 (44), MHV-A59
(10), and MHV-JHM (11) were aligned using the programs
of the University of Wisconsin Genetics Computer Group
software package, version 6.1. The alignment revealed
that the HE gene of HCV-OC43 was more closely related
to those of BRCV-G95 and BCVs than to those of MHVs.
Nucleotide and amino acid sequences among HCV-OC43,
BRCV-G95, and BCVs were 95.8 to 96.3% and 94.1 to 94.8%
identical, respectively, while the amino acid sequence
identity between HCV-OC43 and MHVs was approximately
60%. Fourteen cysteine residues were strictly conserved
in the HE proteins of HCV-OC43, BRCV-G95, BCVs, and
MHVs. The HE of MHV-JHM had 15 more amino acids and two
more cysteine residues than those of HCV-OC43 and
BRCV-G95. The alignment indicated that the eight HE
genes of coronaviruses can be divided into two groups.
The first group includes HCV-OC43, BRCV-G95, and all
BCVs, and the second group includes MHV-JHM and
MHV-A59.
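
As an illustration of how a pairwise identity figure of this kind can be
derived from an alignment, the following Python sketch counts matching
residues over the columns in which neither sequence has a gap. The toy
sequences are hypothetical, the gap symbol "-" is a convention of this
sketch (Fig. 2 uses dots for gaps), and the choice of denominator is an
assumption; alignment packages may count gapped columns differently.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent of aligned, gap-free columns in which both sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Hypothetical toy alignment, not taken from the paper.
toy_a = "MFLLPRFILV-SC"
toy_b = "MFLLLRFTLVASC"
print(round(percent_identity(toy_a, toy_b), 1))  # 83.3
```

Here 12 columns are gap-free and 10 of them match, giving 83.3%.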

To identify a possible evolutionary pathway for the 
HE gene of coronaviruses, we compared the corona- 
virus HE genes with the ICV HE gene. An alignment of 
the predicted amino acid sequences is shown in Fig. 2. 
In this alignment, the ICV HE1 subunit shows a sequence
identity of approximately 28.2% with the HE protein
of HCV-OC43, BRCV-G95, and BCVs, and 26.3%
with the HE protein of MHVs. The alignment shows 
that several regions are completely identical. Most im- 
portantly, the putative acetylesterase active site (F-G- 
D-S) (at amino acids 72 to 75 in Fig. 2) is conserved in 
all HE proteins of human, bovine, and murine corona- 
viruses and ICV. Ten of the 14 cysteine residue posi- 
tions of HCV-OC43 are conserved among all HE pro- 
teins compared. These data suggest that these pro- 
teins may be evolutionarily related to each other. 
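
The notion of a conserved residue position can be made concrete with a
short Python sketch that reports the alignment columns in which every
sequence carries the same residue, here cysteine. The alignment, the
sequence names, and the use of "." for gaps are hypothetical and serve
only to show the bookkeeping; they are not the sequences of Fig. 2.

```python
def conserved_columns(alignment, residue):
    """Return 1-based alignment columns where every sequence has `residue`."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have equal length")
    width = lengths.pop()
    return [
        col + 1
        for col in range(width)
        if all(seq[col] == residue for seq in alignment.values())
    ]


# Hypothetical toy alignment ("." marks a gap); names and sequences are invented.
toy_alignment = {
    "virus_A": "MCFGDSRC.KC",
    "virus_B": "MCFGDSKCA.C",
    "virus_C": "LCFGDSRSAKC",
}
print(conserved_columns(toy_alignment, "C"))  # [2, 11]
```

Column 8 is not reported because the third sequence has Ser at that
position, which mirrors the way a cysteine present in some HE proteins
but not in all of them drops out of the conserved set.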

DNA sequences for each gene were optimally 
aligned based on the alignment of their respective 
amino acid sequences (Fig. 2). A maximum parsimony 
analysis was performed on the aligned DNA se- 
quences to predict possible phylogenetic relationships 
among coronaviruses (detailed methodology for the 
phylogenetic, computer-assisted analysis is described 
in the legend of Fig. 3). Cladistic analysis of the DNA 
sequence data resulted in a single phylogenetic hy- 
pothesis (phylogram) with a total length of 1503 steps 
and a rescaled consistency index of 0.962. This analysis
suggested that all coronaviruses were divided into
two clades. One clade included HCV-OC43, BRCV-G95,
and BCVs. The other clade consisted of MHV-JHM and
MHV-A59. Neither coronaviral clade is derived
from the other. Within the clades, all BCVs were
closely related taxa to HCV-OC43, and the MHV-JHM 
and MHV-A59 were sister taxa. The phylogram sug- 
Fig. 1. Nucleotide and deduced amino acid sequences of the HE gene of HCV-OC43 and BRCV-G95. The primers 3'S95
(5'-ATGGAAACCGTAGTAGTACACTT-3') and 5'HE (5'-GGTTTTATGAATCTCCAGTTGAA-3') were used for cDNA synthesis and PCR amplification.
DNA sequencing was carried out with the modified dideoxynucleotide chain termination procedure (43) using Sequenase (USB, Cleveland, OH).
Sequences were analyzed with the aid of the Sequence Analysis Software Package of the Genetics Computer Group of the University of
Wisconsin and the MacVector Software (IBI, New Haven, CT). The consensus sequences (CTAAAC) are indicated by asterisks. The predicted
signal peptide and intramembrane anchoring sequences are marked by (~). Ten potential N-linked glycosylation sites are underlined. The
putative acetylesterase active site (FGDS) is shown in boldface. The amino acid sequences of two possible internal ORFs are displayed under
the amino acid sequence of the large ORF. The nucleotides of the BRCV-G95 HE gene that differ from those of HCV-OC43 are


SHORT COMMUNICATIONS 


G Tc GA 
CTAAACTCAGTGAAAATGTTTTTGCTTCCTAGATTTATTCTAGTTAGCTGCATAATTGGTAGCTTAGGTTTTTACAACCCTCCTACCAATGTTGTTTCGC 
******         MetPheLeuLeuProArgPheIleLeuValSerCysIleIleGlySerLeuGlyPheTyrAsnProProThrAsnValValSerH 

T G Cc G 


ATGTAAATGGAGATTGGTTTTTATTTGGTGACAGTCGTTCAGATTGTAATCATATTGTTAATATCAACCCCCATAATTATTCTTATATGGACCTTAATCC 
isValAsnGlyAspTrpPheLeuPheGlyAspSerArgSerAspCysAsnHisIleValAsnIleAsnProHisAsnTyrSerTyrMetAspLeuAsnPr 
IORF1 MetGluIleGlyPheTyrLeuValThrValValGlnIleValIleIleLeuLeuIleSerThrProIleIleIleLeuIleTrpThrLeuIleL 


ce 
TGTTCTGTGTGATTCTGGTAAAATATCATCTAAAGCTGGCAACTCCATTTTTAGGAGTTTTCACTTTACCGATTTTTATAATTACACAGGCGAAGGTCAA 
oValLeuCysAspSerGlyLysIleSerSerLysAlaGlyAsnSerIlePheArgSerPheHisPheThrAspPheTyrAsnTyrThrGlyGluGlyGln 
euPheCysValIleLeuValLysTyrHisLeuLysLeuAlaThrProPheLeuGlyValPheThrLeuProIlePheIleIleThrGlnAlaLysValAs 


Cc AC 
CAAATTATTTTTTATGAGGGTGTTAATTTTACGCCTTATCATGCCTTTAAATGCAACCGTTCTGGTAGTAATGATATTTGGATGCAGAATAAAGGCTTGT 
GlnIleIlePheTyrGluGlyValAsnPheThrProTyrHisAlaPheLysCysAsnArgSerGlySerAsnAspIleTrpMetGlnAsnLysGlyLeuP 
nLysLeuPhePheMetArgValLeuIleLeuArgLeuIleMetProLeuAsnAlaThrValLeuValValMetIlePheGlyCysArgIleLysAlaCys 


c C T T 
TTTATACTCAGGTTTATAAGAATATGGCTGTGTATCGCAGCCTTACTTTTGTTAATGTACCATATGTTTATAATGGCTCCGCACAAGCTACAGCTCTTTG 
heTyrThrGlnValTyrLysAsnMetAlaValTyrArgSerLeuThrPheValAsnValProTyrValTyrAsnGlySerAlaGlnAlaThrAlaLeuCy 
PheIleLeuArgPheIleArgIleTrpLeuCysIleAlaAlaLeuLeuLeuLeuMetTyrHisMetPheIleMetAlaProHisLysLeuGlnLeuPheV 


T GG TT 
TAAATCTGGTAGTTTAGTCCTTAATAACCCTGCATATATAGCTCCTCAAGCTAACTCTGGGGATTATTATTATAAGGTTGAAGCTGATTTTTATTTGTCA 
sLysSerGlySerLeuValLeuAsnAsnProAlaTyrIleAlaProGlnAlaAsnSerGlyAspTyrTyrTyrLysValGluAlaAspPheTyrLeuSer 
alAsnLeuValVal--- 


GGTTGTGACGAGTATATCGTACCACTTTGTATTTTTAACGGCAAGTTTTTGTCGAATACAAAGTATTATGATGATAGTCAATATTATTTTAATAAAGACA 
GlyCysAspGluTyrIleValProLeuCysIlePheAsnGlyLysPheLeuSerAsnThrLysTyrTyrAspAspSerGlnTyrTyrPheAsnLysAspT 


T T c c Cc 
CTGGTGTTATTTATGGTCTCAATTCTACAGAAACCATTACCACTGGTTTTGATCTTAATTGTTATTATTTAGTTTTACCCTCTGGTAATTATTTAGCCAT 
hrGlyValIleTyrGlyLeuAsnSerThrGluThrIleThrThrGlyPheAspLeuAsnCysTyrTyrLeuValLeuProSerGlyAsnTyrLeuAlaIl 


T c 
TTCAAATGAGCTATTGTTAACTGTTCCTACGAAAGCAATCTGTCTTAATAAGCGTAAGGATTTTACGCCTGTACAGGTTGTTGATTCGCGGTGGAACAAT 
eSerAsnGluLeuLeuLeuThrValProThrLysAlaIleCysLeuAsnLysArgLysAspPheThrProValGlnValValAspSerArgTrpAsnAsn 


A Cc c Cc 
GCCAGGCAGTCTGATAACATGACGGCGGTTGCTTGTCAACCTCCGTACTGTTATTTTCGTAATTCTACTACCAACTATGTTGGTGTTTATGATATTAATC 
AlaArgGlnSerAspAsnMetThrAlaValAlaCysGlnProProTyrCysTyrPheArgAsnSerThrThrAsnTyrValGlyValTyrAspIleAsnH 
IORF2 MetLeuValPheMetIleLeuIle 


G Cc G A T A 
ATGGAGATGCTGGTTTTACTAGCATACTTAGTGGTTTGTTATATAATTCACCTTGTTTTTCGCAGCAAGGCGTTTTTAGGTATGATAATGTTAGCAGTGT 
isGlyAspAlaGlyPheThrSerIleLeuSerGlyLeuLeuTyrAsnSerProCysPheSerGlnGlnGlyValPheArgTyrAspAsnValSerSerVa 
MetGluMetLeuValLeuLeuAlaTyrLeuValValCysTyrIleIleHisLeuValPheArgSerLysAlaPheLeuGlyMetIleMetLeuAlaValS 


T G T cc G T Cc 

CTGGCCTCTCTACCCCTATGGCAGATGTCCCACTGCTGCTGATATTAATATCCCTGATTTACCCATTTGTGTGTATGATCCGCTACCAGTTATTTTGCTT 
lTrpProLeuTyrProTyrGlyArgCysProThrAlaAlaAspIleAsnIleProAspLeuProIleCysValTyrAspProLeuProValIleLeuLeu 
erGlyLeuSerThrProMetAlaAspValProLeuLeuLeuIleLeuIleSerLeuIleTyrProPheValCysMetIleArgTyrGlnLeuPheCysLe 


T G CA 
GGCATTCTTTTGGGCGTTGCGATTGTAATTATTGTAGTTTTGTTGTTATATTTTATGGTGGATAATGGTACTAGGCTGCATGATGCTTAGACCATAATCT 
GlyIleLeuLeuGlyValAlaIleValIleIleValValLeuLeuLeuTyrPheMetValAspAsnGlyThrArgLeuHisAspAla---        ** 
uAlaPhePheTrpAlaLeuArgLeu--- 


AAAC 1304 


**** 


displayed above the nucleotide sequence of HCV-OC43. 
Fig. 2. Comparison of the amino acid sequences of the HE proteins among human, bovine, and murine coronaviruses and influenza C virus. 
The deduced amino acid sequences were aligned manually. The conserved cysteine residues and the putative esterase active sites (FGDS) are 
marked by asterisks on the top. Dash (-) indicates amino acids identical to the first sequence. Dot (.) denotes gaps introduced to the sequences 
for maximum alignment. The sequences for BCV-Mebus, BCV-Quebec, MHV-JHM, MHV-A59, and ICV were obtained from Refs. 30, 29, 17, and 10, 
respectively. The sequences for BCV-L9 and BCV-LY138 were obtained from recent work (44). 


gested a common ancestor of this antigenic group of 
coronaviruses. The highly cell-adapted strains BCV-L9,
BCV-Quebec, and BCV-Mebus are closely related to the
wild-type strains BCV-LY138 and BRCV-G95. We excluded
the cell-adapted strains from the final phylogenetic
analysis because their close relationships resulted in
collapsed branches in the phylogenetic tree. We further
attempted to analyze the relationship among the HE
genes of selected coronaviruses and ICV based on these
results, using limited DNA sequence information and
assuming ICV as the outgroup. The highly variable
regions (positions 19-185, 235-365, 451-1046,
1255-1295, 1351-1430, and 1459-1503) were excluded
because they could not be aligned with confidence. We
identified 125 phylogenetically informative sites
(variable sites with at least two taxa potentially
sharing a derived base) from 439 aligned base
positions. The phylogram shows a topology for the
coronaviral ingroup almost identical to that obtained
by the previous analysis (Fig. 3).
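
The idea of a phylogenetically informative site can be sketched in
Python using the usual textbook criterion for parsimony: a column is
informative when at least two different bases each occur in at least
two taxa. The toy alignment and taxon names below are hypothetical, and
the treatment of gaps and ambiguity codes (skipped here) is an
assumption of this sketch rather than a description of how PAUP handles
them.

```python
from collections import Counter


def informative_sites(alignment, ignore="-?N"):
    """Return 1-based columns that are parsimony-informative.

    A column counts as informative when at least two different bases
    each occur in at least two taxa. Characters listed in `ignore`
    are skipped (an assumption of this sketch).
    """
    seqs = list(alignment.values())
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all aligned sequences must have equal length")
    sites = []
    for col in range(len(seqs[0])):
        counts = Counter(s[col] for s in seqs if s[col] not in ignore)
        shared = [base for base, n in counts.items() if n >= 2]
        if len(shared) >= 2:
            sites.append(col + 1)
    return sites


# Hypothetical toy alignment of four taxa; not data from the paper.
toy_dna = {
    "taxon_1": "ACGTAC",
    "taxon_2": "ACGTTC",
    "taxon_3": "ATGAAC",
    "taxon_4": "ATGATG",
}
print(informative_sites(toy_dna))  # [2, 4, 5]
```

Column 6 varies but is not informative, because the variant base occurs
in a single taxon and therefore cannot group taxa together.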
[Phylogram; taxa from top to bottom: BCV-LY138, BRCV-G95, HCV-OC43, MHV-A59, MHV-JHM, ICV]


Fig. 3. Phylogenetic tree for the HE gene nucleotide sequences of
human, bovine, and murine coronaviruses and ICV. The complete
HE gene sequences of HCV-OC43, BRCV-G95, MHV-JHM (11),
MHV-A59 (10), and BCV-LY138 (44), and a partial HE gene
sequence of ICV (32) were used for this phylogram. The DNA
sequences were aligned based on their deduced amino acid sequence
alignment as shown in Fig. 2. The data were analyzed with the
software package PAUP (Phylogenetic Analysis Using Parsimony,
version 3.0g, D. Swofford, Illinois Natural History Survey, IL) using the
MULPARS heuristic search option with tree bisection-reconnection
(TBR) branch-swapping. From the alignment, 462 phylogenetically
informative sites were identified from 1503 aligned base positions
per taxon. Horizontal distance (branch length) is proportional to the
number of inferred character changes. Vertical lines are for spacing
branches and labels.


Since HCV-OC43 and ICV infect similar tissues in human
subjects, the significant sequence homology between the
HE genes of the two viruses suggests that coinfection
by an ancestral coronavirus and ICV, followed by
recombination, may have given rise to HCV-OC43. This
was also proposed by Luytjes et al. (10). Phylogenetic
analysis also suggests that the HE genes of
coronaviruses and ICV may originate from a common
ancestor. It is worth noting that the HE protein of ICV
has receptor-binding, acetylesterase, and fusion
activities, while that of coronaviruses has only the
receptor-binding and acetylesterase activities. The
fusion function of ICV is associated with the
N-terminal hydrophobic region of the HE2 subunit of the
HE protein (17-19, 32). A similar hydrophobic domain
was not found in the coronavirus HE protein.

The high similarity between the HE proteins of HCV-OC43
and BCVs (95% identity on average) suggests that the
two viruses are very closely related. This hypothesis
is also supported by the branch distances in the
phylogenetic tree shown in Fig. 3. Interestingly, the
HE of HCV-OC43 is more closely related to those of
BRCV-G95 and the wild-type, virulent strain BCV-LY138
than to that of the cell culture-adapted, avirulent
strain BCV-L9. The wild-type strain BCV-LY138 does not
replicate in numerous bovine cells in vitro, but it
grows readily in human cells (HRT-18) without requiring
prior adaptation (27, 33, 34). Since these polarized
human cells retain many features of primary epithelial
cells, their infection by BCV suggests that BCV may
also infect humans and may therefore be a zoonotic
virus (23, 35).

We previously reported a case of human diarrhea caused
by BCV-LY138, in which the virus was identified from
the infected patient (35). Recently, we found that
BRCV-G95 exhibited cytopathology in vitro almost
identical to that of the wild-type virulent strain
BCV-LY138 (data not shown). HCV-OC43, BCVs, and
BRCV-G95 differed in only a few amino acids in the HE,
and their putative acetylesterase active sites were
conserved (see Fig. 2). O-acetylneuraminic acid was
shown to be the major receptor determinant for ICV
(36-38). HCV-OC43, BCVs, and BRCV-G95 probably
recognize this receptor on the surface of many
different epithelial cells. They may be able to
replicate in epithelial cells of both the respiratory
and intestinal tracts and to cross species barriers,
causing diseases in heterologous hosts. However,
HCV-OC43 primarily causes respiratory diseases and BCVs
cause enteritis. The ability of these viruses to
replicate in different organs and to cause different
clinical symptoms is probably due to multiple amino
acid differences occurring within several viral
proteins. The S protein of MHV was shown to be
important in tissue tropism (39).

Recently, it was reported that turkey enteric
coronavirus is antigenically and genomically closely
related to BCVs (40), and similar functions were found
in the HE protein of HEV (20). Whether swine or turkeys
may also serve as a reservoir (mixing vessel) for
coronavirus recombination in nature, as was proposed
for influenza A viruses (41), remains to be elucidated.
It is worth noting that ICV was also isolated from pigs
(42). It will be worthwhile to compare the HE genes
among these coronaviruses. Comparison of the remaining
genes with those of HCV-OC43 and BCVs will provide
further insight into their evolution and host cell
tropism.